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Abstract 

We revisit the problem of lifting real functions to matrix functions via Chebyshev expansions. Given 
a real function / and a matrix with real spectrum A, we analyze the convergence of error bounds for 
the so-called matrix Chebyshev expansion of f(A). The results shed light on the relation between the 
smoothness of / and the diagonalizability of the matrix A, and its effect on the approximation of f(A) 
using the truncated matrix Chebyshev expansion. We present several numerical examples to illustrate 
our analysis and show an application of the matrix Chebyshev expansion for estimating eigenspaces of 
large matrices, associated with certain eigenvalues. 
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1 Introduction 

Chebyshev polynomials are ubiquitous in applied math and engineering science^. These polynomials 

arise as solutions of a Sturm-Liouville ODE and are used for numerous approximation methods, ranging 

from classical PDE methods m. to modern methods for image denoising m- 

We focus on the problem of lifting a real function /: M —)• M to a matrix function /: Atfc(K) —)■ 

where Atfc(]R) is the set of square real matrices of order k having real spectrum. When / is a polynomial, 

such a lifting is straightforward since addition and powers are well-defined for square matrices. When / 

is not a polynomial, there are several standard methods for defining the above lifting. If / is analytic 

with a convergence radius which is larger than the spectral radius of A, then the Taylor expansion 

f{x) = yields f{A) = ^nA"^. If / is not analytic, it required that at least / G C'™', where 

m is the size of the largest Jordan block of A, which allows to define f{A) on each of the Jordan blocks of A. 

This latter approach has several equivalent definitions, see e.g. [51 Chapter 1]. In practice, most standard 

definitions suffer from numerical and computational drawbacks, which limits their applicability. This is 

the case, for example, with definitions that are based on the explicit form of the minimal polynomial of 

A or its Jordan form. 

*nsli aron@math.princeton.edu 
yoelsh@post.tau.ac.il 

^ “Chebyshev polynomials are everywhere dense in numerical analysis”. This quote by Philip Davis and George Forsythe 
is the opening sentence of | 13| . 
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Inspired by the numerical advantages of Chebyshev polynomials in representing and approximating 
scalar functions, we study the use of Chebyshev expansions for matrix functions. The idea of evaluating a 
matrix function by its Chebyshev expansion is not new. In m it is used in spectral methods for solving 
PDEs. In this context, the Chebyshev polynomials are also examined as a special case of ultraspherical 
polynomials [1], and Feber polynomials [20]. Expansion in the Feber polynomials (for complex spectra) is 
also studied for matrices in m- Chebyshev polynomials are also widely used for pseudospectral methods, 
see [22] and reference therein. In all the above papers it is assumed that / is a smooth function. One 
important analytic function is the exponential, used naturally for solving differential equations. In [Tj 
Chebyshev polynomials are proved to be an effective alternative to Krylov techniques for calculating 
exp(A)u, for a given vector v. Another application of matrix Chebyshev polynomials is in representing 
the best matrix 2-norm approximation for analytic functions, over the space of polynomials of a fixed 
degree [T2| . We can also find matrix Chebyshev polynomials in calculating matrix functions of symmetric 
matrices [6|, computing square roots of the covariance matrix of Gaussian random fields [3], facilitating 
the estimation of autoregressive models and in the construction of image denoising operators m- 

Our contribution 

In this paper we generalize the study of matrix Chebyshev expansions to the case where the matrix is not 
necessarily diagonalizable, and where the function is not necessarily analytic. By making these assump¬ 
tions we reveal a trade-off between two factors; “how much” the matrix is far from being diagonalizable, 
as expressed by the size of the largest Jordan block in the Jordan form of the matrix, and “how smooth 
is the function”. Specifically, as the size of the largest Jordan blocks increases (“less diagonalizable”) the 
smoothness required from the function in order to guarantee the convergence of the matrix Chebyshev 
expansion is higher. 

The contribution of the current paper consists of both a theoretical analysis and a practical application. 
In the analysis part we examine convergence issues and prove convergence rates. The convergence rate of 
the matrix Chebyshev expansion is important since it is crucial in determining how many coefficients one 
has to use to approximate f{A) to a prescribed accuracy using a truncated Chebyshev expansion. Thus, 
the convergence rate directly affects the efficiency of algorithms that are based on the matrix Chebyshev 
expansion. Since Chebyshev polynomial are naturally defined on [—1,1], we divide our analysis to two 
cases: a case where there is no a priori information regrading the distribution of the eigenvalues over 
[—1,1], and a case where all non-semisimple eigenvalues are concentrated inside [—1,1]. As expected, 
the latter case implies less restrictions on the smoothness of the function. In particular, if we denote by 
m the size of the largest Jordan block of A, then the convergence of the matrix Chebyshev expansion 
is guaranteed in the first case for any / G (j‘^rn -2 jg q£ i^gunded variation, while in the 

second case it is true for any / G where is of bounded variation. As for convergence rates, we 

summarize our results for the different cases in Tabled] Each raw of the table presents the requirements, 
from both the matrix and the function, for an expansion of length N to have an error bound of cN~^ 
{i > 1). The constant c is independent of N and i but does depend on other factors as mentioned 
under remarks. We note two properties from the table. First, for matrices of concentrated spectrum, the 
smoothness assumptions on / are weaker. Second, one can guarantee a constant which is independent of 
the size of the matrix k by imposing some extra regularity on the function /. Further conclusions and 
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Spectrum of A 

The function 

Remarks 

T is a J-condense matrix 

/ e and /('«+^-i) is of bounded variation 

All non-semisimple 

eigenvalues of A are in [—1 + <5,1 — J] 

c depends on k, f, m 


/ 6 and /(2'"+^-2) is of bounded variation 

c depends on k, /, m 

T is a J-condense matrix 

/ 6 and /('"+20 is of bounded variation 

All non-semisimple 

eigenvalues of A are in [—1 -|- 5,1 — 5] 

c depends on /, m 


/ 6 C2»n+2£ j{ 2 m+ 2 t+l) jg bounded variation 

c depends on / and m 


Table 1: Table of conditions to establish a convergence rate of order N~^, with a constant c, for a truncated 
matrix Chebyshev expansion of length N. The matrix is of size k x k and with m as the size of its largest 
Jordan Block 


their proofs appear in the text. 

In the application part of the paper we demonstrate the efficiency of matrix Chebyshev expansions 
for extracting eigenvectors associated with a specihc predehned segment of the spectrum. This problem 
is usually addressed by inverse power iterations, where one has to solve a linear system at each iteration, 
see e.g., [71 Chapter 8]. Instead, we show that by designing an appropriate hlter function, we can apply a 
truncated Chebyshev expansion and retain fast and accurate results in just a few iterations. This method 
provides a signihcant benefit as the size of the matrix increases. 


The structure of the paper 

The paper is organized as follows. In Section [2] we present our notation and the required mathematical 
background. Section [3] introduces the matrix Chebyshev expansion and our main theoretical results. 
These results include the conditions for convergence of a matrix Chebyshev expansion to the matrix 
function and bounds on the convergence rates. We illustrate numerically some of our theoretical results 
in Section 0] and demonstrate the application of the matrix Chebyshev expansion for the problem of 
subspace recovery in Section [5l 

2 Background and notation 

Chebyshev polynomials of the first kind of degree n are defined as 

T„(3:) = cos (narccos(x)) , xG[— 1,1], n = 0,1,2,... (1) 

These polynomials are solutions of the Sturm-Liouville ordinary differential equation 

(1 - x‘^)y'' - xy + n^y = 0 (2) 
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and consequently satisfy the three term recursion 


Tn{x) = 2xTn-i - Tn- 2 , n = 2, 3,... 


( 3 ) 


with Tq{x) = 1 and Ti{x) = x. Therefore, Chebyshev polynomials form an orthogonal basis for L 2 ([—1,1]) 
with respect to the inner product 


(/) 5)t ~ 




dt. 


vr J_i y/l - t 2 

The Chebyshev expansion for any / with finite norm with respect to (j4]) is 


( 4 ) 


f{x) ~ ^ an[f]Tn{x), Onif] = 

n=0 


( 5 ) 


where the dashed sum 
is dehned as 



denotes that the first term is halved. The truncated Chebyshev expansion 
N 

SN{f)[x) = ^ an[f]Tn{x). (6) 

71=0 


S]sf{f){x) provides a polynomial approximation which is the best least squares approximation with respect 
to the induced norm \ \f\\rp = y/(/, f)j^- Remarkably, this least squares approximation is close to the best 
minimax polynomial approximation, measured by the maximum norm ||/||oq = iiiax 3 ,g[_i i]|/(x)|. The 
following theorem was established by Bernstein [2] back in 1918 (and was known even before, see references 
therein). 


Theorem 2.1 (Bernstein). For any f G C'([— 1,1]) and N > 0 


\\f{x) - S’Ar(/)(x)||^ < AAr||/(a:) -p^(x)||^. 


where p*j^ is the unique best minimax polynomial approximation of degree N, and Xn behaves asymptoti¬ 
cally as log{N) for large N. 

The constant Xn is the Lebesgue constant, and a formula for its exact value has been derived in [18] 
(and in particular, it is less than 6 for N < 1000). Note that on the boundaries of [—1,1] we only consider 
one-sided continuity (and derivatives when required). 

Define the total variation (TV) norm by 

ii^iItv= ll5'ili = j 

Note that this norm can be written in a distributional form instead of the derivative form and thus we 
say that a function has a bounded variation if its TV norm is finite (whether continuous or not, e.g., 
the step function). Using the TV norm it was recently shown that for a smooth /, the uniform error 
||/(x) - S'Ar(/)(x)||^ decays rapidly [23 Chapter 7] 

Theorem 2.2 (Trefethen). Let f G C'™“^([—1,1]), where is absolutely continuous such that 
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f{m) 


< oo. Then, for each N > m, the Chebyshev coefficients satisfy 


TV 

“ N{N - 1) • • • (iV - m) “ {N - m)^+^ 

and moreover 

||/(x) -5^(/)(x)||^ < 

where C = C(f, m) = — 

\J t / Trm 

The trigonometric form m implies that r„(cos(0)) = cos(n0), n = 0,1,2,..., and reveals that 
the Chebyshev expansion of a given f{x), x G [~ljl] coincides with the Fourier cosine series g{6) = 

E /oo 

(^n[f] cos{n6) where g{9) = f{cos{6)) with 6 G [0,7r]. The argument cos{6) in / makes g even 

n=0 

and periodic in 6, which implies that g^^'{dsnT) = 0. Thus, no periodicity is required from / nor from its 
derivatives to ensure that g is differentiable and its Fourier series can be differentiated term by term. We 
conclude the above discussion with the next lemma. 

Lemma 2.3. Let f G 1,1]) he such that /(2m-2) absolutely continuous with of bounded 

variation, m G N. Then, 

1. The series 

OO 

OLn[f]T^^\x), j = 0,1,... ,m - 1, xG[-1,1], 

71=0 

is absolutely convergent. 

2. One can differentiate the Chebyshev expansion term-by-term to have 

OO 

/0)(x) = ^ an[/]r^^)(x), j = 0, l,...,m, xG[-1,1]. 

n .=0 



The proof of Lemma 12.31 is given for completeness in Appendix lA.ll 

3 Matrix Chebyshev expansion 

This section presents the theoretical results of the paper; we define a matrix function via its Chebyshev 
expansion, discuss its convergence, and derive its convergence rate. To this end, unless otherwise stated, we 
assume a given matrix norm||-|| that satisfies the sub-multiplicative property ||AT|| <||A||||y||. Equipped 
with II'll, we restrict the convergence analysis to absolute convergence (in norm). Namely, given a series 
of matrices C we consider the convergence of ||A„|| as n tends to infinity. Note that 

since all matrix norms are equivalent, absolute convergence in one norm implies the absolute convergence 
in any other norm. For more details about convergence of matrix series and matrix norms, we refer the 
reader to uni Chapter 5]. A short introduction to matrix functions and their basic definitions is provided 
for completeness in Appendix [Bj 
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3.1 Definition and convergence 

In the sequel, we denote by Ai,...,Afc G M the eigenvalues of a given matrix A G Alfc(lK) and by 
p(A) = maxj=i^,,, fc|Aj| the spectral radius of A. Since we discuss Chebyshev polynomials, we assume 
p(A) < 1 and that / is a scalar function defined on [—1,1]. Also, we denote by ||v^||f = \/ tr(XX^) the 
Frobenius norm of X. 

We begin by defining the matrix Chebyshev expansion. 

Definition 1 (Matrix Chebyshev expansion). Given a function f on [—1,1], we define the partial sum 

N 

SMif){A)= an[f]Tn{A), 
n=0 

where an[f] is given in In addition, if the limit Sooif){A) is absolutely convergent with respect to 
any matrix norm H-H, we define Soo{f){A) as the matrix Chebyshev expansion, 


Soo{f){A) = hm SN{f){A). 

N^oo 


(7) 


In this subsection, we address two fundamental questions: how can one guarantee the (absolute) 
convergence of 5oo(/)(A)? And, does Soo{f){A) coincide with f{A) when using the standard definitions 
of matrix functions (see Appendix [B])? 

A basic property of polynomials, and specifically the Chebyshev polynomial T„, is that they preserve 
matrix similarity. Namely, A = P~^BP implies Tn{A) = P~^Tn{B)P . Therefore, for a diagonalizable 
matrix A, evaluating Tn{A) reduces to apply Tn{x) on the eigenvalues of A. Thus, for such matrices, the 
convergence of the Chebyshev expansion of a scalar function is inherited by its matrix version. This is 
summarized in the following theorem. 


Corollary 3.1. Let f be a function on [—1,1] having an absolutely convergent Chebyshev series. Then, 
for any diagonalizable matrix A, the matrix Chebyshev expansion of f{A) is convergent. 


Proof. The diagonalizability means that A = Q ^DQ, where D = diag(Ai,..., A^). Therefore, ||T„(A)|| < 
||Q-i||||T„(D)||||Q|| and also|T„(Ai)| < 1, 1 < i < k, see e.g., m, which implies that ||T„(iA)||^ < y/k. 
Thus, we get 


n=0 



^WQWpVkY |«n[/]| < OO- 

72=0 


□ 


For a general matrix A G Alfc(M), we can try a similar argument but with the Jordan form A = Z~^JZ, 
where J = diag (Ji,..., Jp) is a diagonal block matrix with the blocks J\,...,Jp on its diagonal. Since 
applying a polynomial on a block diagonal matrix reduces to applying it to each block separately we get 
that, 

T„(A) = Z“Miag (r„(Ji),... ,r„(Jp))Z, n G N. (8) 

Thus, in order to understand T„(A), it is s to understand T„(Ji), where Ji is a Jordan block of order 
which we denote by ki (see also the explicit form in (|33p l. 
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Lemma 3.2. Let p{x) = X]l=o ^nx'^ be- a polynomial of degree £. Then 

p('=i-i)(Ai) \ 

(ki-iy. 

P'(A*) 

y P(Ai) y 

where Ji is the ki x ki Jordan block, as in m- 

The proof of Lemma 13.21 is given in Appendix IA.2I Obviously, this result must agree with standard 
definitions of matrix functions, see e.g., (j34|) of Definition [5] in Appendix [Bj 

The explicit form of T„(Jj), derived from Lemma 13.21 gives rise to the next convergence result. The 
only additional fact we need for generalizing Corollary 13.11 to non-diagonalizable matrices is a bound on 
the derivatives of the Chebyshev polynomials. In m Chapter 1.5] it is proven that for any j G N 


/ 


p{Ji) = 


P{K) P'(\) 
p{K) 




n^{n^ — 1) • • • (n^ — (j — 1)^) ^ 

(2j - 1) • • • 5 • 3 • 1 - (2j-l)!- 


(9) 


Therefore we have. 


Theorem 3.3. Let A G A4fc(M) he a matrix and let m > 1 be the size of the largest Jordan block in 
the Jordan form of A. Assume that f G 1,1]) such that y( 2 m- 2 ) is absolutely continuous and 

is of bounded variation. Then, the Chebyshev expansion Soo{f){A) is absolutely convergent. 


Proof. It is convenient to prove the theorem using the £i-induced matrix norm, ||Ai||^ = maxj 
whereby we get 

||r„(A)||^ < ^ ^,max^||r„(Jj)||^^ ||Z||^, n G N. (10) 

By the explicit form given in Lemma 13.21 for any Jordan block Ji of order ki 




1 d^ 


-1 




n G N. 


( 11 ) 


Therefore (|lip is bounded by 


ki — l 




< n 


2ki-2 


1 + ^ 2-M < 3n2^*"^ n G N. 


( 12 ) 


j=l \ j=0 

Recall that m = maxj=i^...^p ki and combine (fTOll and (fT^ to get 


Tn(A)||^< ||Z||^ max 3n2^*-2 < 


1 


(13) 


where Ca = 3||Z ^||^||Z||j^isa constant that depends solely on the Jordan form of A and is independent 
of n. By Theorem 12.2[ there exists Cf such that |an[/]| ^ Cfn~‘^"^ for large enough n. Therefore, we 
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have that 

OO OO OO 

^ \\an[f]T^{A)\\^ = l«-[/]|||^-(^)|li < CjCa Y1 (14) 

n=n* n=n* n=n* 

and so Soo{f){A) of d?]) is absolutely convergent. □ 

We next show that Soo{f){A) coincides with the standard definitions for f{A). To answer the second 
question given after Definition [T] we have, 

Theorem 3.4. Let A £ and let f satisfy the conditions for an absolutely convergent matrix 

Chebyshev expansion. Then, Soo{f){A) = f{A), where f{A) is one of the standard definitions for lifting 
a scalar function to matrices, as appear in Appendix O 

Proof. We prove that matrix Chebyshev expansion is equivalent to Definition [5] in Appendix [B] since 
all standard definitions for lifting a scalar function to matrices are equivalent. We use the notation 
of the proof of Theorem 13.31 and by (l8|) deduce that it is enough to show that the definitions coincide 
on the Jordan blocks of A. Note that in the special case where the matrix is normal, as in Corollary 
EH the definitions are equivalent. This is true due to the absolute convergence which implies pointwise 
convergence of the Chebyshev series for numbers, together with the fact that the function / is evaluated 
element-wise on the diagonal form. 

Denote by m the size of the largest Jordan block of A. An element in the matrix Soo{f){Ji), corre¬ 
sponding to the Jordan block Ji and the eigenvalue Aj, is given for every 1 <p < j < m by 


= Y^an[f] 


(15) 


72=0 




72=0 


Note that the absolute convergence of the matrix Chebyshev expansion implies the absolute and uniform 
convergence of the scalar Chebyshev expansion of /. Thus, one can differentiate the Chebyshev series of 
/ term-by-term to have f^A^ j = 0,... ,m — 1 (see Lemma 12.31) . Since Lemma 13.21 states that 

1 d^~P 

we get that for l<p<j<ki<m 




72=0 


dx^-P^ 


{j -pV-' 


which coincides with (j34p of Definition El as required. 


□ 


3.2 Relaxing the smoothness reqnirements 

The proof of Theorem 13.31 relies upon the bound Q that can be improved for cases where the eigenvalues 
of A lie strictly inside the interval [—1,1]. We start by examining the hrst derivative of T„(x), which by 






([U) is T'^ix) = for X = cos(0). Therefore, by [TT] 


Ki^)\ < 


n 


y/l — x' 


|x| < 1, n = 0,1, 2,... . 


(16) 


The factor increases near the end points of [—1,1] and ultimately forces us to use the global bound 

of n^, see e.g., m Chapter 1]. However, for a segment /^^ = [—1 + (5,1 — 5], with a fixed 0 < 5 < 1 
use a linear bound un with 


we can 


u = 


V2d{i-6y 

For higher derivatives, one can use ([2]) to derive [HI Chapter 1, p. 32-33] 

(1 - xyTjy+^\x) - {2k - I)xTW(x) + (n^ - {k - iy)Tjy-^\x) = 0, 1 < A: < n. 


(17) 


Thus, we get 




rf-i)(x) 


jxj < 1, 1 < A; < n. 


(18) 


The latter leads to the following bound on higher order derivatives of Tn{x). 

Lemma 3.5. Let 0 < 5 < 1 and 1 < k < n. Then, the k^^ derivative ofTn{x) satisfies 




< CkU^, |x| < 6, 


where Ck = Ck{i^,k) is a constant independent of n, and v is given by (fT71) . 

Proof. We prove the lemma by induction on the order k of the derivative. For the basis of our induction 


we have 
assume t 


Tn (x) < 1 = {uny. For A: = 1 the lemma holds due to ()16l) and the fact that x € Ig. Now, 
re claim is true for any j that satisfies 0 < j < A; < n, for a fixed k. Then, for A: +1 we get by (|18|) 


a leading order term from the leading order of 


Tiy-^\x) 


. Namely, 


Tn^^\x) <n^v‘^{vny ^ + cn^. 


It is understood from (1181) that the constant c of the lower order term (LOT) depend on k and u but not 
on n. □ 

Remark 1. We did not elaborate on the LOTs of the bound on Tn^^^\x), which appear in the proof as 
cn^. For example, with further induction, one can derive that the constant in front of the term is 
bounded by pj-ggj {g given for completeness in Avvendix \A.tA Nevertheless, the explicit 

form of the constants of the LOTs are less important for us as we aim to pose convergence results, which 
are typically stated for large n. Thus, the explicit forms of the LOTs are omitted. 

Lemma 13.51 shows that in many cases the bound of for T^{x) is very pessimistic and so are the 
conditions of Theorem 13.31 above. We therefore define the class of matrices for which we can relax the 
smoothness requirements from the function. 

Definition 2 ((5-condense matrix). Let 0 < <5 < 1. We call a matrix A £ A^fc(M) with p{A) < 1 a 
5-condense matrix if any eigenvalue in [—1, —(5] n [5,1] is semisimple (its algebraic multiplicity is equal to 
its geometric multiplicity). 
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Having the above definition, we get the next convergence result. 

Corollary 3.6. Let A E be a 6-condense matrix. Denote by m > 1 the size of the largest Jordan 

block in the Jordan form of A, and assume that f G 1,1]) such that is absolutely contin¬ 

uous with of bounded variation. Then, the Chebyshev expansion Soo{f){A) is absolutely convergent. 

Proof. We follow the proof of Theorem 13.31 and modify it as described next. First, using Lemma 13.51 we 
replace ()12p with 


fci-i 


1 


|F„(Jj)||^ < 1 + ^ 




ki-l 


< 1 + V ^ 


-CjU^ < CkiU 


ki-l 


for a constant that depends on ki and iz but is independent of n. Since by definition m = maxj=i^...^p ki, 
we can bound 


Tn{A)\\^<Kn^-\ 


K = max Cfc. 

2 = 1,...,p * 




-1 



Here K \s a. constant that depends only on v and on the Jordan form of A. By Theorem 12.21 the 
assumptions on / ensure that ()14p still holds. □ 


Remark 2. Using the same arguments above, we can also update the conditions of Lemma 1^.31 to f £ 
1,1]) such that is absolutely continuous and /f™-) is of bounded variation, for cases where 

X G [—J, (5] with a positive J < 1. 


3.3 The truncated matrix Chebyshev expansion 

The convergence rates of the Chebyshev expansion for scalar functions is well-studied (see Section [2j) and 
we wish to extend some of the results to the matrix case. In particular, we generalize Theorem 12.21 to the 
case of matrix functions, in light of the additional factors we mentioned in the previous section. 

In the case of a diagonalizable matrix A = Q~^DQ we bound the convergence rate of SN{f){A) based 
on the convergence rate of the scalar function / as 


SN{f){A)-f{A)\ 


< 


Q 


-1 


SN{f){D)-f{D)\\\\Q\\, 


where the Chebyshev expansion on the diagonal matrix D acts element-wise. Next, we focus on more 
general cases where the matrices are allowed to be non-diagonalizable. In the following result we show 
how the smoothness of / is translated to convergence rate. Specifically, given extra I derivatives over the 
required for convergence in Theorem 13.31 implies extra I powers of error decay. 


Theorem 3.7. Let A G and let m > 1 be the size of the largest Jordan block in the Jordan form 

of A. Assume f G 1,1]) such that fi‘^'^-‘^+^) is absolutely continuous and fi‘^^-^+^) is of 

bounded variation. Then, for a given matrix norm||-|| we have 


\SN{f){A)-fiA)\\<C 


{N - {2m-l + i)Y+^' 


N > 2m — 1 -h £, 


where the constant C = C{f,£,m,\\-\\) is independent of N. 
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Proof. Since by Theorem I3..SI and Theorem 13.41 Sr^ (f) (A) converges in norm to f{A) we have 


SN{f){A)-f{A)\ 


Y, OCn[f]Tn{A) 
n=N+l 


which can be bounded by X]^Ar+i|'^n[/]|||7"n(^)||- By the equivalence of norms we have that H-H < 
CnormlMli some Constant Cnorm; and so by (fTHI) 


Tn{A)\\ < C^ormCAU^^-^ 


(19) 


The assumptions on / and Theorem 12.21 imply that 

2V 1 


a 


.1/11 < 


n > 2m — ! + .£, 


with V = 


j(2m-l+£) 


vr n(n — 1) • • • (n — (2m — 1 + £)}' 

Note that can be simplified as 


n 


2m—3 


1 


(n — 1) ■ ■ ■ (n — (2m — 3)) (n — (2m — 2) • • • (n — (2m — 1 + £)) ’ 


which is bounded by 


For n > 2m — 1 + ^ we have 


n 


2m—3 


n — (2m — 3) y \re — (2m — 1 + £) 


1 


1+2 


n 


^ 2m — 3 ^ 2m — 3 

= 1 +-^ < 1 + 


n — (2m — 3) n — (2m — 3) 

2m—3 


£ + 2 


Therefore, we denote C = C'normC'A(^) combine (fT^ and (l20l) to have 


|a„[/]|||r„(^)|| <c 

n=N-\-l 


n=N-\-l 

roo 

< c 

Ja 

c 


1 


i+2 


n — (2m — 1 + i 
dx 

N (x - (2m - 1 + £))i^+^) 


1 


£ + 2j (N - (2m-l+£)Y+^' 


( 20 ) 


□ 


The last proof can be modified to the case of 5-condense matrices. In particular, (jl9j) turns to 
||7"n(^)|| ^ C'normC'ylCm- in™’ Therefore, the requirements from / can be weakened and the remaining 
calculations of the proof hold with m — 1 replacing 2m — 2. We summarize it in the following. 

Corollary 3.8. Let A G A^fc(M) be a 5-eondense matrix and let m > 1 be the size of the largest Jordan 
bloek in its Jordan form. Assume f G C'™“^+^([—1,1]) such that is absolutely eontinuous and 
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j{m+e) hounded variation. Then, for a given matrix norm we have 


||5„(/)(/l) -/(^)|] < N>m + l, 

where the constant C = C{f,£,m,\\-\\) is independent of N. 

Remark 3. There are some intermediate cases between the two cases we presented in Theorem \cl.7\ and 
Corollary \3.^ Specifically, consider A that has an eigenvalue of 1 or —1, which is not semisimple, but 
its associated Jordan block is of size fh with fh < m. That means that the power of n in the bound dED 
is determined by the maximum between m — 1 and 2fh — 2. Then, the smoothness requirements from f 
should be modified accordingly. The details of such a proof can be easily recovered from the la.st section 
and therefore are omitted. 

Theorem \m shows a way to lift convergence rate results from scalars to matrices. However, two 
weaknesses of this approach are hidden in the constant C. First, C has as a factor Ca which consists 
of the condition number of Z (see (I14p b This condition number can be significantly large. Second, C 
depends on the relation between matrix norms, which typically depends on the size of A. 

Inspired by [H], we suggest an alternative approach for lifting convergence rate results from scalars 
to matrices, based on duality theorem for matrix norms, see e.g., [9l Chapter 3]. The bound we get is 
independent of the size of the matrix, but costs an additional smoothness from /. This independency 
means, for example, that for a given / the convergence rate for a matrix consists of one Jordan block is 
the same as for a matrix consisting of many replicates of this block. 

Theorem 3.9. Let A E AJfc(M) and denote by m the size of the largest Jordan block in the Jordan form of 
A. Assume f G 1]) such that j(2m+2^) absolutely continuous and j(2m.+2£+i) of bounded 

variation. Then, for a given matrix norm we have 

||5^(/)(H)-/(H)|| N>i, 

where C is a constant independent of N and k. This constant can be bounded by 


2 

J 



Y! an[f]Tr^iA)T(f+^\t) 

n=0 


dt. 


The proof of Theorem 13.91 is given in Appendix IA.41 where we lift Theorem 12.21 for matrices using an 
auxiliary Chebyshev operator, acting on matrix-valued functions. Similar approaches were introduced in 
[H Chapter 6] and [H] for other types of matrix operators. 

We conclude this section with a variant of Theorem 13.91 for J-condense matrices. 


Corollary 3.10. Let A G A4fc(l^) bs a 5-condense matrix and denote by m the size of the largest Jordan 
block in the Jordan form of A. Assume f G 1,1]) such that fi^+^-^) {g absolutely continuous 

and is of bounded variation. Then, for a given matrix norm we have 

||5Ar(/)(A)-/(A)|| iV>£, 
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where C a constant independent of N and k. 


4 Numerical examples 

We turn to demonstrate numerically the theory devoloped in Section [31 In the context of numerical codes, 
it is worth mentioning the Chebfun software system and resources [5], where the interested reader can 
further study other numerical implementations and issues related to Chebyshev expansions. 

This section is divided into three parts. The first part briefly explains our implementation of matrix 
Chebyshev expansions. The second part demonstrates the evaluation of non-analytic matrix functions 
by matrix Chebyshev expansions. The third part illustrates numerically the convergence properties of 
matrix Chebyshev expansions for Jordan block matrices. 

4.1 Notes on implementation 

The common practice for evaluating Chebyshev expansions starts by computing the expansion coefficients 
based on the discrete orthogonality property of Chebyshev polynomials. This computation is nothing but 
the Clenshaw-Curtis quadrature for (|3|), and thus can be implemented using the Fast Fourier Transform 
(FFT) algorithm, see e.g., [211 Chapter 5.8]. 

Having obtained the expansion coefficients for Sn of dH), evaluating Sn employs Clenshaw’s algorithm 
(see e.g., [Ml Chapter 5.5]), which exploits the three term recursion dH), as described in [Ml p. 193]. Note 
that this algorithm is valid also for matrices since each polynomial consists only of powers of A, which 
means that any two such matrix polynomials of A commute. Commutativity makes scalar algorithms, 
such as Clenshaw’s algorithm, applicable to matrices. The algorithm for evaluating matrix Chebyshev 
expansions is given for completeness as Algorithm [1] in Appendix O 

The entire code, of both this section and the next Section[5l is available online at https : //github. com/nirsharon/ 
The numerical examples were executed on a Springdale Linux desktop equipped with 3.2 GHz i7-Intel 
Core™ cpu with 16 GB of memory, and were run on Matlab 2015a. 

4.2 Lifting non-analytic functions 

An important motivation for using Chebyshev expansions is to calculate the matrix version of non-analytic 
functions. A textbook solution involving diagonalizing A might be computationally infeasible for large 
matrices, and numerically unstable for general matrices. The same holds if A cannot be diagonaliable, 
and its Jordan decomposition is used instead. Thus, we consider the matrix Chebyshev expansion as an 
alternative. 

In our first set of examples, we use three different functions which we apply on symmetric random 
matrices of size 10 x 10. These matrices have, with high probability, only simple eigenvalues (geometric 
multiplicity of one) and are diagonalizable from symmetry. We use this simple setting to demonstrate the 
convergence rates of Subsection 13.31 
The first function we use is 

/i(x) = sign(x)x2 = <^ . (21) 

X otherwise. 
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This function is C^, with jump discontinuity in the second derivative at x = 0. Clearly, no Taylor series 
is available for evaluating fi{A). Since the second derivative is of bounded variation, we expect from 
Theorem 13.71 that the error of the truncated matrix Chebyshev expansion of order N would decay as N~‘^ 
(take m = 1 and j = 1). To numerically validate this theoretical result, we calculate matrix Chebyshev 
expansions with up to 2000 coefficients. /i is an odd function and thus only half of its expansion coefficients 
are nonzero. The error is measured using the spectral norm and given by 


SN{f){A)-f{A)\\/ 



We present in Figure [Tal the coefficients’ decay, the approximation error of the truncated matrix Chebyshev 
expansions, and the theoretical bound (of on a logarithmic scale, as functions of the expansion length 
N. Indeed, we see that the approximation errors fit the theoretical bound. 

We repeat the last experiment with the continuous function 


/2(x) = vR- 


( 22 ) 


The derivative of this function has a singularity at x = 0, nevertheless it is of bounded variation. As such, 
the coefficients’ decay is slower than of /i and so we expect the convergence rate to behave like As 

in the previous example, we show on a logarithmic scale the coefficients’ decay, the convergence rate and 
its theoretical bound. As seen in Figure llbl the numerical convergence rate fits its expected theoretical 
rate. 




Figure 1: Lifting scalar functions to 10 x 10 random matrices: comparisons of the coefficients’ decay, 
the approximation error of the truncated matrix Chebyshev expansion, and the theoretical bound of 
Theorem 13.71 


As a last example for this part, consider 


h{x) 


1 

x2 + 0.25‘ 


(23) 


This function is analytic around zero, but with radius of convergence of 0.5, due to its poles at ±0.5L 
Therefore, the Taylor expansion of fs^x) cannot be used for matrices having eigenvalues above 0.5. The 
matrix that we use is of size 10 x 10, with seven eigenvalues whose magnitudes larger than 0.5, as seen in 
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the bar plot of Figure [2al Since /s has derivatives of any order everywhere on the real line, the coefficients 
decay rapidly and we have to use only 73 coefficients (among them only 37 nonzero) to get double precision 
accuracy. Figure [2b] shows that indeed both the expansion coefficients and the approximation error decay 
rapidly. 



(a) The spectrum of the matrix (b) Convergence rate of the matrix Chebyshev ex¬ 

pansion and decay rate of the Chebyshev coeffi¬ 
cients 


Figure 2: Evaluating /s of (12311 on a matrix of size 10 x 10. 



4.3 Matrix Chebyshev expansion on Jordan blocks 

As an example for applying matrix functions on non-diagonalizable matrices, we investigate the case of 
Jordan blocks. We focus on the two parameters of Jordan blocks: their eigenvalues and their size. 

We begin by considering the function 

fiix) =\x\^-^, xe[-l,l]. (24) 

This function has 3 absolutely continuous derivatives, but the fourth one is not of bounded variation. 
Thus, for Jordan blocks with eigenvalue in (—1,1), Corollary 13.61 guarantees the convergence only for 
m < 3. We evaluate /4 for Jordan blocks associated with the eigenvalue 0.7 and of sizes m = 2, 3 and 4. 
The expansion of for the hrst two Jordan blocks is guaranteed to converge, while for the third it does 
not. The convergence rates of the three expansions are depicted in Figure [3al where the truncation error 
is given as a function of the expansion length. The error axis is in logarithmic scale, and still there is a 
clear difference between the decay rates of the three cases. 

When considering J-condense matrices, for and for Jordan blocks of size 3 (with eigenvalue in 
[—1 -|- J, 1 — J], the convergence is guaranteed, while if the eigenvalues are 1 or —1, the convergence is 
guaranteed only for Jordan blocks of size 2. We plot in Figure !^ the truncation errors for three cases of 
different eigenvalues 0.4, 0.7, and 1. We see that the convergence rate associated with the eigenvalue 0.7 is 
slightly higher than the one of 0.4, which can be explained by the growing constant ()17p and Lemma 13.51 
Also, the errors that are associated with the eigenvalue 1 are large and its matrix Chebyshev expansion 
doesn’t seem to converge. 

The other parameter that affects the convergence rate is the order of the largest Jordan block. In 
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Expansion length 

(a) Three different sizes of Jordan blocks with 
eigenvalue 0.7 



Expansion length 


(b) Three Jordan blocks of order m = 3 correspond 
to the eignevalues 0.4, 0.7, and 1 


Figure 3: Truncation errors for evaluating of (j24p on Jordan blocks as a function of the expansion 
length. 



Figure 4: Convergence rates of /s from (j23p for three different block diagonal matrices. Each block 
corresponds to the eigenvalue 0.5. In the legends are the different orders of the blocks on the diagonal of 
our test matrix. 


particular, we show in Theorem 13.91 that while the size of the largest Jordan block is significant, the 
matrix size should not effect the convergence rate. In order to test this observation numerically, we 
compare three block matrices of order 10, having blocks of varying sizes on the diagonal. These blocks 
have the eigenvalue 0.5 on the diagonal but also 0.5 on subdiagonal entries (instead of traditionally 1). 
This is done to avoid high condition numbers which can obscure the results of this test. The hrst matrix 
consists of a single block of order 10 x 10, the second consists of two blocks of order 5, and the third 
consists of five blocks, each of order 2. We use the function /s of ()23p to allow for large Jordan blocks. 
The results are plotted in Figured! where we notice that the convergence rates depend on the size of the 
largest Jordan block. It is worth noting that we repeat this test with a matrix having on the diagonal 
two replicates of the above matrices to experience exactly identical results, which conhrms numerically 
the observation of Theorem 13.91 
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5 Application of matrix Chebyshev expansions to eigenspaces recovery 


We demonstrate the possibility of using matrix Chebyshev expansions for estimating the eigenspace which 
corresponds to a given eigenvalue of a matrix. This problem is usually addressed by the inverse power 
method [71 Chapter 8], where one uses the power method on {A — \I)~^. This requires to solve at each 
iteration a system of linear equations of the order of the matrix. Thus, for huge matrices the conventional 
inverse power method might be too computationally expensive. 

Recall that since matrix functions preserve similarity, the operation f{A) can be interpreted as filtering 
the spectrum of A. Thus, we propose to use a filter function, which is designed in advance to retain 
eigenspaces that correspond to eigenvalues in a specific segment I C p{A). The ideal filter for this task is 
the characteristic function of I, defined as one on I and zero otherwise. Applying such a function on A 
results in a lower rank matrix A that has, outside its kernel, only the eigenspaces of A that correspond 
to eigenvalues in I. Specifically, the spectrum of A will include only ones, for the eigenvalues in I, and 
zeros otherwise. Note that in order to reveal the eigenvalues themselves, we can either use as a filter the 
characteristic function multiplied by the identity function, or in the case of one eigenvalue, evaluate it by 
computing the product of any eigenvector from the recovered eigenspace with the matrix. 

The main problem with evaluating the ideal filter is its poor smoothness, as it is not even continuous. 
Therefore, we use an alternative continuous approximation defined by 

fix) " ^ - c)| - (25) 

Here, erf is the error function (integration over the Gaussian function), c is the center of I, R is (roughly) 
half of the length of I, and r controls the steepness of the transition from zero to one (the smaller r the 
steepest f{x) is). The function f{x) is smooth except at the center point c, where its first derivative 
has a jump discontinuity due to the absolute value. This discontinuity implies Gibbs phenomenon when 
approximating / with the truncated Chebyshev series. Namely, there is always a fixed magnitude of 
approximation error centered around c. However, approximation errors around c are almost meaningless as 
we are interested only in suppressing eigenspaces that correspond to eigenvalues outside I. An illustration 
of ()25p for different sets of parameters is given in Figure [5l 


—c=0, R=.3, r=.2 — c=.5, R=.2, r=.1^ 



Figure 5: The filter function (j25p for two different sets of parameters 
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We consider a scenario where A is a very large matrix, and assume we only have access to its application 
to any given vector. In addition we assume as apriori knowledge that we have an approximated value 
of the specific eigenvalue whose eigenspace we wish to recover. To make this scenario realistic, we must 
assume there is some gap between this eigenvalue and the rest of the spectrum, for otherwise any known 
variant of the power method would converge very slowly. The parameters of our filter are designed so c 
is the approximated eigenvalue and r and R depend on the gap. As we decrease r (corresponding to a 
smaller gap), the derivatives of / increase, which means we have to use more coefficients in the truncated 
Chebyshev series to obtain the same prescribed accuracy of approximation. 

In the following we describe two examples to demonstrate the use of matrix Chebyshev expansions for 
the application of eigenspace recovery. In our first example, we generate symmetric matrices of varying 
order, up to size 15, 000 x 15, 000, which is (approximately) the maximum size of a dense matrix that 
fits in memory on our machine. Each matrix has a spectrum consists of ones, zeros, and halfs. The goal 
is to reveal the 20 eigenvectors corresponding to the eigenvalue 0.5. We compare two methods for this 
problem: the first method is the inverse power method, implemented by solving a linear system of the 
form [A — XI)x = u with A = 0.5 at each iteration using Matlab’s implementation (the EIG procedure 
with the backslash A\b solver). The second method is our filtering via matrix Chebyshev expansion. 
To make a fair comparison between the two methods, we require the same precision of 10 digits when 
measuring the error by 

k 

'^\\Auj - Xuj\\ . 
i=i 

To reach such a precision using the matrix Chebyshev expansion, we can truncate the expansion at large 
N. However, that means computing many matrix-vector multiplications. A more efficient approach is to 
approximate f{x) using some small N such as N = 10, and to repeat the application of this approximated 
function a few times. In particular, the required precision in our case is reached after seven iterations. 
We initial the iteration with a set of random vectors whose number is at least as large as the dimension of 
the subspace to be recovered. The results of the comparison between the two algorithms is presented in 
Figure [6l where the gap between the two method becomes significant as the size of the matrix increases. 



5000 10000 15000 

Matrix size 

Figure 6: A timing comparison between iterative power method (by Matlab) and eigenspace recovery 
using matrix Chebyshev expansion. Time is measured in seconds, as a function of the matrix size. 
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size 

50,000 

100,000 

500,000 

1,000,000 

Timing (sec.) 

3.617 

8.658 

51.970 

108.258 

Precision 

2.758e-10 

1.660e-ll 

3.677e-ll 

5.190e-ll 


Table 2: Timing (in seconds) of eigenspace recovery for big matrices. 

In our second experiment, we use a special type of matrices which are of the form Q^DQ, where D 
is a diagonal matrix (the real spectrum) and Q is the orthogonal matrix of the discrete cosine transform 
(DCT). Namely, for a vector x, Qx is the discrete cosine transform of type II of x. The advantage of using 
such a matrix for our example is that realizing the vector products Qx and Q~^y = Q"^y for any vectors 
X and y is extremely fast. We examine four sizes of matrices ranging from 50,000 to 1,000,000, each 
matrix with a spectrum constructed as in the first example. Note that such dense, large matrices cannot 
be fitted into the RAM of our machine. We measure the time required for our algorithm to estimate 
the required eigenspace to the precision of ten digits. The results are summarized in Table El indicating 
how fast and efficient it is to use the matrix Chebyshev expansion for this task. Note that solving linear 
systems of these sizes using conventional methods is far from being trivial and thus applying the iterative 
power method is intricate. 


Appendix A Complementary proofs 


A.l Proof of Lemma 12.31 

Proof. By Theorem 12.21 we have that |an[/]| < with a positive constant Cf. By the bound Q, we 


have that 


an[f]Tjf\x) <Mn = Cf , for any j = 0,1,..., m - 1, and thus 




x;m„ 

n=l 


Cf 


oo ^ 

V_^ 


(2j - 1)! ^ n2{m-j)+2 

^ ' n=l 


< OO) j < m — 1. 


(26) 


E /OO / •\ 

arXffPn ix 

n=0 


< oo, as required. 


For the second part of Lemma [2.31 we can use the Weierstrass M-test to conclude uniform convergence 


of 


X] J = 0,1,... ,m - 1. 


n=0 


Moreover, it is clear that 


N 


^'a„[/]rW(x), NeN, j = 


n=0 

is a polynomial of degree N and so obviously differentiable, and thus, we can differentiate it term-by- 
term. □ 
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A.2 Proof of Lemma 13.21 

Proof. Decompose each Jordan block Ji as Ji = A*/ + iV, where N is the ki x ki nilpotent matrix 


'^0 1 ^ 

A= ° . 

1 

V % 

The scalar matrix Ajl commutes with any matrix in (M), and thus one can apply the Binomial formula 
to expand (J*)"' = (A*/ + A)", n G N. Consequently, 


(JiT = 


n\ \n—1 
i 


A? 




I — ^* + 1 ^ 


/n\ \n—1 


A? 


Therefore, for 1 < q < j < ki 




n=0 


= 2^ ar, 

n=0 


n 

.3 - Q 




□ 


A.3 The second leading order term of Tit\x) 

As noted in Remark [H we can derive a bound on the constant in front of the term of the fc-th 

derivative Tn\x). We denote by Bj- the bound and prove by induction that Bk < . For the base of 

the induction, we get from (fTHI) that the constants are equal to 0, 1 and 3, for A; = 1, 2 and 3, respectively. 
Assume that the claim is true for any j that satisfies 0 < j < k < n, for a fixed k. Then, for A: + 1, by 
P8I) we get the terms out of two factors: from the (leading order) term of that is multiplied by 
{2k — and from the term (second leading order) of that is multiplied by Combining 

the two factors leads to 

Bk+i < {2k-iy+^ + u^Bk-i. 

The latter recursion can be solved by repeatedly substituting the above bound to yield 


i=i 


as required for A: + 1. 
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A.4 Proof of Theorem 13.91 


To support the proof of Theorem 13.91 we need to define and derive some properties of an auxiliary 
Chebyshev term-by-term operator, acting from M to Affc(M). 

Definition 3. The term-by-term Chebyshev polynomial of order N for a function f: [—1,1] —)• Affc(M) is 

N 

QNif){x) = l^n[f]Tnix), 

n=0 

where fdn [/] G with (/3„ [f])ij = Un [fix)ij] for all 1 <i,j <k. 

The coefficients {f3n}n=o matrices defined by which is the Chebyshev inner product dH, 

in each entry. One important relation between the Chebyshev term-by-term operator of Definition [3] and 
the truncated Chebyshev series for scalar functions ([6]) is as follows. 

Lemma A.l. Let /: [—1,1] —Atfc(M) be a matrix-valued function such that f{x)ij G C'([—1,1]), for all 
^ < i, j < k. Then, for any fixed B G Affc(M) 




tr (^f{t)B'^'j = tr (^/3n [f{t)] B'^'J . 


In words, the trace and the (Chebyshev) inner product commute. 

Proof. Define h: [—1,1] ^ M by h(t) = tr . The trace operator is linear and since / is continuous 

in each entry so is h. Therefore, an [h] is well-defined and 


Q^n [h] — 


tr 




2 Yli,j=l fi^)i,jBi,jTn{t) 


k 


VT^ 

'‘,T 


dt 


■lj — ^ 

k 

Bijan [f{t)i,j] = tr (^j3n [f{t)] B^ 


*j=i 


□ 


Equipped with Lemma lA.ll we are ready for the first step, where we lift Theorem 12.21 to the case 
of the Chebyshev term-by-term operator for matrix-valued functions. To this end, we use fundamentals 
from matrix duality theorem, stating that for any X G Affc(M) and a matrix norm H-H, there exists a 
matrix Y such that ||A|| = tr(Ay'^) and ||T||^ = 1, where H-H^ is the dual norm defined as 


lAll^ = max |tr(Ay^ )|. 

L 


The duality ensures that (IHI^)^ = IHI and that tr(Ay^) < ||A||||y||^. For more details see |10l Chapter 
5]). 
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Theorem A.2. Let /: [—1,1] —)• Alfc(M) he such that fij e C™ ^([—1,1]) where /I™ is absolutely 
continuous and is of bounded variation for any 1 < i, j < k. Then, 

||/(X) - <3a.(/)W|| < N>m, 

where C is a constant independent of x and N. 

Proof. Denote by Ef^{Q, f){x) = f{x) — QN{f){x) the error matrix at x. By the duality theorem for 
matrix norms, for any given A € Affc(M) and a matrix norm H-H, there exists a matrix B with ||-B||^ = 1 
such that ||A|| = tr(AB^). Now, let xq E [—1,1] be a fixed point and let Bq be the corresponding matrix 
such that 


\EN{Q,f){xo)\\ =tr (^EN{Q,f){xo)B^^ = tr (^f{xo)B^^ - tr (QN{f){xo)B^ 


In other words, ||^Ar(Q,/)(xo)|| = /i(xo) - tr [/]T„(xo)B^j where h{t) = tr (^f{t)B'^'^ (as in 

the proof of Lemma lA.ll) . Note that Tn{xo), 0 < n < N, are scalars, and by the linearity of the trace 
operator 


N 


N 


tr I ^ /3n [/] Tn{xo)B^ I = ^ Tn{xo) tr if] B^'J . 


n=0 


n=0 


^fN 


n=0 


Oir, 


By Lemma fA.il the sum on the right-hand-side of (12711 is equal to 
means that 

N 

\\En{Q, f){xo)\\ = h{xo) - ^ an[h]T„(xo). 


tr 


{fm 


(27) 


Tnixo), which 


(28) 


n=0 


The right hand side of (|28p is the Chebyshev error term of the scalar function h, and therefore, to bound 
it using Theorem 12.21 we need to verify that h{x) satisfies the conditions of that theorem. The regularity 
of h is directly inherited from the components of /, and so 


h(j)(x) = tr (^/f^)(x)B(f^ , j 




Thus, it remains to verify the finiteness of 


where in our case 




D 


hi^) 


TV 


. By the duality theorem. 


tr(AB^ 


< 


B 


\D 


= 1, and consequently. 




TV ^ 


y(m+i)(^) 


B^ 


D 


dt < 


1-1 


f 


(m+l) 


it) 


dt < oo. 


The last inequality can be verify easily using the max norm and the conditions on /jj. Applying Theo¬ 
rem 12.21 to h leads to the required bound due to (l28|) . □ 


There are two important notes regarding Theorem IA.21 First, the constant C of the bound is inde¬ 
pendent of k (the order of the matrix). Nevertheless, the size of the largest Jordan form does have an 
effect and C can be bounded by ^ dt. Second, there are no restrictions on the matrix 
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norm, except of being sub-multiplicative. We preserve these properties also in Theorem 13.91 
Proof of Theorem \3.9i . Let g'- [—1,1] —)■ be a matrix-valued function dehned by 


g{x) = Y! an[f]TniA)Tn{x). (29) 

Note that Theorem 13.31 guarantees absolute convergence of the matrix Chebyshev expansion of /, and 
thus, g is well-defined since 

CX) CX) OO 

Yhn[f]TniA)Tn{x)\\ = ^ hn[f]Tn{A)\\\Tn{x)\ < ^'|| O, [/]r„(A) || < OO. 
n=0 n=0 n=0 


On the one hand, g{l) = f{A) since Tn{l) = 1 for all n = 0,1,2,.... On the other hand, using the 
operator of Definition [3] 

N N 

QN{g){i) = Y^n[g]Tn{i) = Y^^nig], (30) 

n=0 n=0 

where each matrix /3n [ 5 ] is dehned as 


[g]ij 


2 /■! /■! {ET=o^e[m{A)m))^Y-{t) 

I CLL — I (JjL 

TT y_i ^ J-1 Vl - 


Each element of {Yl^=o is bounded by thus, we can use the 

bounded convergence theorem to interchange the integral and sum. Therefore, we have 


Pn [g]ij 



{a,[miAmt))^.T 4 t) 


dt. 


Note that Ti(t) is a scalar and a£[f]T£(A) is independent of the integration variable. Thus, we can 
rearrange the latter equation and use the orthogonality of the Chebyshev polynomials to get 


e=o j-i Vi r 

CX) 

= ^ (a,[/]r,(^))^ . 5,,^ = {animiA))^ . , 


i=0 


where 61 n is the Kronecker delta. Combining the latter equation with (l30]) we get that QAr(fl')(l) = 

E ,n 

an[f]Tn{A) = SN{f){A) and therefore, 

n =0 

||5(i) - giv(5)(i)|| = ||/(^) - ^iv(/)(A)|| . (31) 


It remains to show that the assumptions on / implies that g € so that we can apply Theorem lA.21 

< , and the bound 


on g to get the error bound Tiiis is done by considering the bound Tn\x) 
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m on ||T„(A)||. Then, we have 


Y,\\c^n[f]Tn{A)T(^\x) 

n=l 


< OO, 


0<j<i + l, 


and Lemma 12.31 implies the required smoothness. 


□ 


Appendix B Matrix functions 


Given /: M —>• M we want to evaluate the matrix function f{A), that is, to lift the scalar function / 
to /: Atfc(M) —>• Atfe(M) where Atfc(M) is the set of k x k real matrices having real spectrum. We next 
provide two standard dehnitions for matrix functions. The interested reader is referred to [51 Chapter 1] 
for more details as well as for additional equivalent definitions. 

Definition 4 (Matrix function via Taylor series). Let A £ and assume f is an analytic function 

with radius of convergence exceeding p{A). Then, by Taylor’s formula, one defines 


OO 

fi^) = E 


n=0 


&Ho) 

n! 




(32) 


Note that the absolute convergence of (1321) is guaranteed under the assumptions of Definition 01 


Definition 5 (Matrix function via Jordan form). 
where J is a block diagonal matrix. Denote by Ji 
eigenvalue Xi, namely 



Assume A G AJfc(M) has the Jordan form A = Z ^JZ, 
the Jordan block of order ki x ki, corresponding to the 


Ji = 


\ 


Xi 



(33) 


Also, denote by m = max* ki the largest block size of J. Then, for f having at least m — 1 derivatives on 
[—1,1] one defines 

f{A) = Z-^f{J)Z, 


where f{J) is a block diagonal matrix, whose i-th block f{Ji) is an upper triangular matrix, given by 


fiJi) 


' /(A.) /'(A,) 

/(A*) 


V 


(fci-1)! 


/'(A.) 
/(A.) ) 


(34) 


Dehnitions 0] and [5] coincide when the conditions of Dehnition 0] hold. In addition, there are several 
other equivalent standard dehnitions for f{A), based for example, on Hermite interpolation of the roots 
of the minimal polynomial of A, see e.g. [HI Chapter Ij. 
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Both Definitions m and El have inherent computational drawbacks. In particular, 

1. Definition 0] requires evaluating high order derivatives of the given function. In addition, requiring 
/ to be analytic is in some cases too restrictive. 

2. Definition [5] is suitable for non-analytic functions, but requires the Jordan form of A whose compu¬ 
tation may be unstable due to the condition number of Z, see e.g., [71 Chapter 7]. 


It is worth noting that most of the other equivalent standard definitions of a matrix function are also 
difficult to implement numerically (for an arbitrary given function). For example, evaluating the minimal 
polynomial of a matrix is usually avoided. This reasoning encourages us to give an alternative, equivalent 
approach. 


Appendix C Clenshaw’s algorithm for matrices 


Algorithm 1 Clenshaw’s algorithm for evaluating a truncated matrix Chebyshev expansion 
Input: Matrix A of order k and coefficients 

E tN 

CnTn (A). 

n=0 

1 : T ^2- A- Ik 
2 - d^O 
3: dd ^ 0 

4: for n ^ N downto 2 do 
5: b i — d 

6 : d 2 ■ T ■ d — dd + Cn ■ Ik 

7: dd b 

8: end for 

9: return T ■ d — dd + 0.5 ■ cq ■ Ik 
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